Mining Biclusters of Similar Values with Triadic Concept Analysis
Abstract
Biclustering numerical data became a popular data-mining task in the early 2000s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only a few methods address the complete, correct and non-redundant enumeration of such patterns, which is a well-known intractable problem, and no formal framework exists. In this paper, we introduce important links between biclustering and formal concept analysis. More specifically, we show for the first time that Triadic Concept Analysis (TCA) provides a nice mathematical framework for biclustering. Interestingly, existing TCA algorithms, which usually apply to binary data, can be used (directly or with slight modifications) after a preprocessing step to extract maximal biclusters of similar values.
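To make the notion of a "maximal sub-table with close values" concrete, here is a minimal Python sketch. It rests on assumptions that the abstract does not state: "close" is read as "the largest and smallest value in the sub-table differ by at most a threshold", the threshold name `theta` and the toy table are invented for illustration, and the check is a brute-force test, not the TCA-based enumeration proposed in the paper.

```python
from itertools import product

def is_similar(table, rows, cols, theta):
    """True if all values in the sub-table rows x cols lie within theta of each other.

    Assumption: "close values" means max - min <= theta (theta is a hypothetical name).
    """
    values = [table[r][c] for r, c in product(rows, cols)]
    return max(values) - min(values) <= theta

def is_maximal(table, rows, cols, theta):
    """True if the sub-table is similar and no further row or column can be added."""
    if not is_similar(table, rows, cols, theta):
        return False
    for r in range(len(table)):
        if r not in rows and is_similar(table, rows | {r}, cols, theta):
            return False
    for c in range(len(table[0])):
        if c not in cols and is_similar(table, rows, cols | {c}, theta):
            return False
    return True

# Made-up 3 x 4 numerical table (objects as rows, attributes as columns).
table = [
    [1, 2, 2, 8],
    [2, 1, 1, 9],
    [7, 6, 2, 1],
]

print(is_maximal(table, {0, 1}, {0, 1, 2}, theta=1))  # values are all 1 or 2
print(is_maximal(table, {0, 1}, {0, 1}, theta=1))     # column 2 could still be added
```

Testing every candidate sub-table this way grows exponentially with the table size, which is consistent with the abstract's remark that complete enumeration is intractable in general.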
Similar resources
Three Interrelated FCA Methods for Mining Biclusters of Similar Values on Columns
Biclustering numerical data tables consists in detecting particular and strong associations between subsets of objects and subsets of attributes. Such biclusters are interesting since they model the data as local patterns. Whereas there exist several definitions of biclusters, depending on the constraints they should respect, we focus in this paper on biclusters of similar values on columns. There a...
Extraction de biclusters à valeurs similaires avec l’analyse de concepts triadiques
Biclustering numerical data became a popular data-mining task in the early 2000s, especially for analysing gene expression data. A bicluster reflects a strong association between a subset of objects and a subset of attributes in a numerical object/attribute data table. So-called biclusters of similar values can be thought of as maximal sub-tables with close values. Only a few methods address ...
DNA Microarray Data Analysis: A Novel Biclustering Algorithm Approach
Biclustering algorithms refer to a distinct class of clustering algorithms that perform simultaneous row-column clustering. Biclustering problems arise in DNA microarray data analysis, collaborative filtering, market research, information retrieval, text mining, electoral trends, exchange analysis, and so forth. When dealing with DNA microarray experimental data, for example, the goal of bicluste...
Enumerating all maximal biclusters in numerical datasets
Biclustering has proved to be a powerful data analysis technique due to its wide success in various application domains. However, the existing literature presents efficient solutions only for enumerating maximal biclusters with constant values, or heuristic-based approaches which cannot find all biclusters or even guarantee the maximality of the obtained biclusters. Here, we present a general fa...
Efficient mining of maximal biclusters in mixed-attribute datasets
This paper presents a novel enumerative biclustering algorithm to directly mine all maximal biclusters in mixed-attribute datasets, with or without missing values. The independent attributes are mixed or heterogeneous, in the sense that both numerical (real or integer values) and categorical (ordinal or nominal values) attribute types may appear together in the same dataset. The proposal is an ...